Notes � animal behaviour IV, a-life

Anastasoff, S. 1999. Evolving Mutation Rates for the Self-Optimisation of Genetic Algorithms. In Advances in artificial life� : 5th� European Conference, ECAL'99, Lausanne, Switzerland, September, 1999 : proceedings, Floreano, D., Nicoud, J-D., Mondada, F. (eds.), pp. 74-78, Springer Verlag, Berlin.

Taddei F., Radman M., Maynard Smith J., Toupance B., Gouyon PH., Godelle B. 1997. Role of mutator alleles in adaptive evolution, Nature 387: (6634) 700-702.

Sniegowski, P.D., Ferrish, P.J., Lenski, R.E. 1997. Evolution of high mutation rates in experimental population of E. coli. Nature 387: (6634) 703-705.

Hart, W. E., Kammeyer, T. E., Belew, R. K.1994. The role of development in genetic algorithms. In D. Whitley and M. Vose, editors, Foundations of Genetic Algorithms III. Morgan Kauffman.

Things to keep in mind are the dynamics of the process, the representation of information, the way in which variation is introduced (mutation, recombination) and the relationship between genotype and phenotype. If you can�t get hold of any of the references let me know, and I�ll give you a copy.

Reading � Levy

Reading � Mitchell

Rechensberg (1965, 1973) � Evolutionsstrategie

evolutionary computation = evolution strategies, evolutionary programming and genetic algorithms

John Holland (1960s, Michigan) - genetic algorithms

n=16 sorting networks

W. Daniel Hillis (1990, 1992) originally managed it in 65, the number that Bose and Nelson managed in 1962, but Green had managed 60 in 1969

then employed parasites (in the form of test cases) �/span> arms race, managed 61

schema theory

traditional theory (Holland 1975): �assumes that, at a very general level of description, GAs work by discovering, emphasising and recombining good �building blocks� of solutions in a highly parallel fashion. the idea here is that good solutions tend to be made up of good building blocks � combinations of bit values that confer higher fitness on the string in which they are present�

Holland (1975) introduced schemas �to formalise the informal notion of �building blocks�. a schema is a set of bit strings that can be described by a template made up on ones, zeros and asterisks, the asterisks representing� wild cards. For example, the schema H = 1 **** 1 respresent the set of all 6-bit strings that begin and end with 1 (where H = hyperplane, Goldberg (1989), because schemas define hyperplanes, planes of various dimensions � in the l-dimensional space of length-l bit strings). the strings that fit this template are said to be instances of H. the schema H is said to have two defined bits (non-asterisks) or, equivalently, to be of order 2. its defining length (the distance between its outermost defined bits) is 5.�

although �not every possible subset of the set of length-l bit strings can be described as a schema � a central tenet of traditional GA theory is that schemas are implicitly the building blocks that the GA processes effectively under the operators of selection, mutation and single-point crossover� � Building Block Hypothesis

�in evaluating a population of n strings, the GA is implicitly estimating the average fitnesses of all schemas that are present in the population, and increasing or decreasing their representation according to the Schema Theorem� � implicit parallelism. �the effect of selection is to gradually bias the sampling procedure towards instances of schemas whose fitness is estimated to be above average�.

�mutation provides an �insurance policy� against fixation�.

this particular schema theory only works for single point crossover operators though.

biologically-inspired computation

movement of biologically-inspired computation - massively parallel, adaptive, learn, creative

vs �gedanken experiments� or �idea models� (Roughgarden et al. 1996)

Structure

Terms

evolution, fitness function, natural selection

genetic algorithm, a-life (evolutionary computation, parallelism)

recombination, mutation, crossover, inversion (selection operator)

genotype + phenotype, representation, maturation

gene, allele, genome, chromosome

local search, speciation/sub-populations, migration

Similarities

complex, non-linear systems

apparent teleology, emergent

Differences

genotype - genetic code/mechanism, selection operator, reproduction

discrete vs continuous alleles � alphabet

genome comprised of a single chromosome

multi-bit alleles

interaction between genes � genetic regulation �/span> flexible

Gray encoding

the genetic mechanism itself can't evolve, isn�t part of the phenotype, stuck with the genes that the programmer chooses???

recombination/crossover, mutation, inversion

haploid vs diploid

1 or more cross-overs, break within alleles

asexual, both parents contribute genetic code

generations are in discrete, non-overlapping time steps

inversions, gene doubling, deletion � dominance, translocation, sexual differentiation and introns

phenotype + ecology

maturation = how get from genotype � phenotype??? simplicity of the manner in which the genotype expresses itself in the phenotype � rather than being used like a recipe, the genotype space is being directly translated into the phenotype space, using the genes directly in the solution???

the fitness function is artificial, as opposed to being natural and self-selecting (endogenous), like survival value � needs an algorithm for scoring (or at least comparing) fitness

interactions among population members (coevolution, symbiosis and competition) rather than independently selected on the basis of individual fitness

complexity and harshness of environment

programmer can impose top-down restraints, because he often knows a little about the shape of the solution space (e.g. in a very simple example, if the fitness function = getting close to a fixed string like a British telephone number, then the programmer can eliminate any phenotypes that don�t start with �0�)

fitness proportionate selection � good???

misc

no acts of God

virtual rather than chemical - it's the underlying computation rather than the physiology being modelled - and only the computation that we assume is important (e.g. assumptions about the importance of mutation)

just the complexity and number of genes and the size of the organism (in multi-cellular organisms, anyway)

other three paradigms that nature employs: ontogeny, learning and culture

size of evolutionary time

fixed population sizes

Points

biological plausibility � seemingly more powerful new ideas vs the single most powerful (and the original) solution we have (nature) � researchers are keen to try different techniques, some of which may actually be better than the particular ones employed in nature, or better suited to the problem

the lack of a phenotypic level may prove short-sighted � evolution has deliberately added this extra disconnection, probably because it makes things more powerful and flexible. simply treating the genes as parameters, or simpler still, the weights in neural networks, may not be enough � we need full-scale embodied behaviour. although this will often prove significantly more computationally expensive, it allows the introduction of ontogenetic and cultural factors, as well as just evolved and learnt

fitness landscape � these aren�t really fixed, because you can�t really define the fitness of an organism independent of the fitness of the other organisms

I'm surprised there isn�t a more pronounced schism between biologically-plausible and more artificial approaches. for instance, I'd definitely opt for a 4-alphabet allele� or is there really bugger all difference between a binary and quaternary alphabet???

lots of GAs are all single populations, or at best speciated into sub-populations. what about predators and trophic levels??? (see Levy)

aren�t there very very abstracted parallel search techniques??? assuming the fitness function is easy, it�s all about choosing where to search in the space, and navigating your way around within it.

can look at evolution in terms of information theory, e.g. the amount of information contained in a population, �how evolution processes that information to create structures that lead to higher fitness�

there are some areas where it�s difficult to know whether we�re doing things differently or have added something to nature, since we really don�t have a clue what exactly nature is doing!

you definitely get Selfish Gene effects in GAs, which is very evolutionarily real (also e.g. Baldwin effect)

limitations on allowable fitness function

requires an algorithm for scoring (or at least comparing) fitness

ideally able to break the problem up into smaller problems

Definitions

the genotype gives rise, under fetal and later devleopment, to the organism�s phenotype � its physical and mental characteristics, such as eye colour, height, brain size and intelligence (Mitchell)

organisms whose chromosomes are arrayed in pairs are called diploid; organisms whose chromosomes are unpaired are called haploid. most sexually-reproducing species are diploid (Mitchell)

during sex, recombination occurs: in each parent, genes are exchanged between each pair of chromosomes to form a gamete (a single chromosome) and then gametes from the two parents pair up to create a full set of diploid chromosomes

in haploid sexual reproduction, genes are exchanged between the two parents� single-strand chromosomes

nucleotides = elementary bits of DNA

fitness is typically defined as the probability that the organism will live to reproduce (viability) or as a function of the number of offspring the organism has (fertility)

in GAs, the term chromosome typically refers to a candidate solution to a problem often encoded as a bit string

the �genes� are either single bits or short blocks of adjacent bits that encode a particular element of the candidate solution (e.g. a particular parameter)

most methods called �GAs� have at least the following elements in common: populations of chromosomes; selection according to fitness; crossover to produce new offspring; and random mutation of new offspring

parallel population-based search with stochastic selection of many invidividuals, stochastic crossover and mutation

each chromosome/organism can be thought of as a point in the search space of candidate solutions

the GA processes populations of chromosomes (they�re all located within that population�s fitness landscape, right???)

selection of parent chromosomes �with replacement� means that the same chromosome can be selected more than once to become a parent

fitness proportionate selection, in which the number of itmes an individual is expected to reproeduce is equal to its fitness divided by the average of fitnesses in the population (equivalent to �viability selection�) = roulette wheel sampling (Goldberg 1989)

in AI, weak methods = general, strong methods = specialised

Prisoner�s Dilemma = Merill Flood, Melvin Dresher, 1950s, Rand corporation

a population is a group of organisms that interbreed and share a gene pool.

gene pool is all the genes in a population at a particular time

gene is a sequence of nucleotides coding for a protein, or part of a protein

a protein is a molecule made up of a sequence of amino acidds. many of the important molecules in a living things are proteins (enzymes, for example)

a pseudogene is a sequence of nucleotides in the DNA that resembles a gene but is non-functional for some reason

a nucleotide is a unit building block of DNA and RNA; a nucleotide consists of a sugar and phosphate backbone with a base attached

diploid � having two sets of genes and two sets of chromosomes (one from the father, one from the mother). many common species, including humans are diploid (rather than haploid or polyploid)

intron �the nucleotide sequences of some genes consist of parts that code for amino acids, with other parts that do not code for amino acids interspersed among them. The interspersed non-coding parts, which are not translated, are called introns; the coding parts are called extrons.

dominance � an allele (A) is dominant if the phenotype of the heterozygotes Aa is the samea s the homozygote AA. The allele a does not influence the heterozygote�s phenotype and is called recessive. An alllel nay be partly, rather than fully, dominant; then the heterozygous phenotype is nearer to, rather than identical with, the homozygote of the dominant allele

hetero/homozygote???

Questions

Factual

Koza, tree chromosomes, LISP S-expressions etc.???

�Absolute Ignorance� � which contemporary of Darwin�s???

evolutionary bias???

are random chunks sometimes inverted??? if so, doesn�t that screw with things, because there�s no reason for something that works well for one gene to be good for the one next door, surely??? is it because the gene-level is more like numbers than an alphabet, because one string of values is as valid as another, whereas letters have meanings � it�s all about the genotype�phenotype translation code�???

what are schemas??? are they still used???

genes as functional blocks of DNA � the alleles aren�t just single-bit values, but multi-bit combinations???

in diploid species, are the two chromosomes in each pair identical???

do chunks get shifted about within the chromosome during recombination???

what difference between multiple/single large chromosomes???

is there a case when a genetic algorithm which deliberately flaunts biological plausibility has been significantly more successful than one which adheres to biological plausibility???

what do the introns do, if anything???

do viruses have DNA???

can recombination work with just one parent/haploid as well as two parents/diploid???

did bees etc. change their evolutionary mechanism to suit the tasks of honey-making etc., or did their evolutionary mechanism alter, thereby happening to also conveniently change their society

would cross-chromosomal schemas add anything to a large template on a single combined chromosome???

are Building Blocks (Holland) the same as genes???

could a universal Turing machine compute the uncomputable??? which of the processes by which we usually define life are computable???

Confusion + consideration

what parameters do you need/want to specify a NN???

how do NNs and GAs differ in terms of problems they�re applicable to??? how much overlap???

how much are we limited in terms of our problem space by the necessity of choosing genes???

what about second order GAs/NNs???

is the Anastoasoff paper with a non-static mutation rate sort of like a momentum (as in learning rules in NNs)???

can you get second-order, selfish gene effects???

can I have genomes/chromosomes of entirely dynamic size??? if so, what sort of reproductive/recombination mechanism can I employ???

what do recombinations and inversions look like in vector space???

GAs, evolution strategies, evolutionary programming, hill-climbing, simulated annealing and tabu search are examples of general �search for solutions� methods � what are they all??? what about branch-and-bound and A*???

when a GA searches a vast search space incredibly quickly what is it doing � is that it starts of with the population scattered absolutely all over, then floods the bottom 90% and does local searches from each enclave from then on, weeding out lower enclaves, as well as those in highish enclaves but tiny pocks???

does it make a difference that GAs are usually binary???

any given genotype is an instance of many many schemas (2^l, right???), and so isn�t the GA doing a bit of constraint satisfaction between them all???

why do animals� bodies get bigger (Cope�s rule)??? why do we evolve to more complexity???

what other gene transformation mechanisms can we imagine � what about non-local effects (the back propagation equivalent)??? can we not use calculus to figure out how to traverse towarsd the minimum, or is that a serial/very slow technique???

what were the 2 things about evolution that von Neumann used to stress???

does the phenotype need to be embodied in order for the genetic mechanism itself to evolve???

can you have nested genes???

is the environment complex and harsh for an amoeba??? it�s much simpler, just with a higher element of unpredictable apparent randomness???

Evolutionary computation is a term used to include evolution strategies and evolutionary programming as well as genetic algorithms.

difference at genotype level : genes = multi-bit alleles